Detecting Wikipedia Vandalism using WikiTrust - Lab Report for PAN at CLEF 2010

نویسندگان

  • B. Thomas Adler
  • Luca de Alfaro
  • Ian Pye
چکیده

WikiTrust is a reputation system for Wikipedia authors and content. WikiTrust computes three main quantities: edit quality, author reputation, and content reputation. The edit quality measures how well each edit, that is, each change introduced in a revision, is preserved in subsequent revisions. Authors who perform good quality edits gain reputation, and text which is revised by several high-reputation authors gains reputation. Since vandalism on the Wikipedia is usually performed by anonymous or new users (not least because long-time vandals end up banned), and is usually reverted in a reasonably short span of time, edit quality, author reputation, and content reputation are obvious candidates as features to identify vandalism on the Wikipedia. Indeed, using the full set of features computed by WikiTrust, we have been able to construct classifiers that identify vandalism with a recall of 83.5%, a precision of 48.5%, and a false positive rate of 8%, for an area under the ROC curve of 93.4%. If we limit ourselves to the set of features available at the time an edit is made (when the edit quality is still unknown), the classifier achieves a recall of 77.1%, a precision of 36.9%, and a false positive rate of 12.2%, for an area under the ROC curve of 90.4%. Using these classifiers, we have implemented a simple Web API that provides the vandalism estimate for every revision of the English Wikipedia. The API can be used both to identify vandalism that needs to be reverted, and to select highquality, non-vandalized recent revisions of any given Wikipedia article. These recent high-quality revisions can be included in static snapshots of the Wikipedia, or they can be used whenever tolerance to vandalism is low (as in a school setting, or whenever the material is widely disseminated).

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Wikipedia Vandalism Detection Through Machine Learning: Feature Review and New Proposals - Lab Report for PAN at CLEF 2010

Wikipedia is an online encyclopedia that anyone can edit. In this open model, some people edits with the intent of harming the integrity of Wikipedia. This is known as vandalism. We extend the framework presented in (Potthast, Stein, and Gerling, 2008) for Wikipedia vandalism detection. In this approach, several vandalism indicating features are extracted from edits in a vandalism corpus and ar...

متن کامل

Wikipedia Vandalism Detection Through Machine Learning : Feature Review and New Proposals ∗ Lab Report for PAN at CLEF 2010

Wikipedia is an online encyclopedia that anyone can edit. In this open model, some people edits with the intent of harming the integrity of Wikipedia. This is known as vandalism. We extend the framework presented in (Potthast, Stein, and Gerling, 2008) for Wikipedia vandalism detection. In this approach, several vandalism indicating features are extracted from edits in a vandalism corpus and ar...

متن کامل

Detecting Wikipedia Vandalism using WikiTrust

WikiTrust is a reputation system for Wikipedia authors and content. WikiTrust computes three main quantities: edit quality, author reputation, and content reputation. The edit quality measures how well each edit, that is, each change introduced in a revision, is preserved in subsequent revisions. Authors who perform good quality edits gain reputation, and text which is revised by several high-r...

متن کامل

Novel Balanced Feature Representation for Wikipedia Vandalism Detection Task - Lab Report for PAN at CLEF 2010

In online communities, like Wikipedia, where content edition is available for every visitor users who deliberately make incorrect, vandal comments are sure to turn up. In this paper we propose a strong feature set and a method that can handle this problem and automatically decide whether an edit is a vandal contribution or not. We present a new feature set that is a balanced and extended versio...

متن کامل

Wiki Vandalysis - Wikipedia Vandalism Analysis - Lab Report for PAN at CLEF 2010

Wikipedia describes itself as the “free encyclopedia that anyone can edit”. Along with the helpful volunteers who contribute by improving the articles, a great number of malicious users abuse the open nature of Wikipedia by vandalizing articles. Deterring and reverting vandalism has become one of the major challenges of Wikipedia as its size grows. Wikipedia editors fight vandalism both manuall...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010